Language and Text-Independent Speaker Identification System Using GMM
Abstract
This paper motivates the use of the Dynamic Mel-Frequency Cepstral Coefficient (DMFCC) feature, and of a combination of DMFCC and MFCC features, for robust language- and text-independent speaker identification. The MFCC feature, modeled on the human auditory system, has been widely used for speaker recognition because of its lower vulnerability to noise perturbation and its small session variability. However, the human auditory system can also sensitively perceive pitch changes in speech. Therefore, an algorithm that integrates the change in speaker-specific pitch information into the design of the Dynamic Mel-scale filter bank yields improved effectiveness in speaker identification. The individual Gaussian components of a Gaussian Mixture Model (GMM) represent vocal tract configurations that are effective for speaker identification. The performance of the speaker identification system is evaluated experimentally on a microphone speech database of 120 speakers. The experiments examine the speaker Identification Error Rate (IDER) using test segments of different lengths and text-independent utterances in Tamil and English. Compared with the identification error rate of 5.8% obtained with the MFCC-based system and 2.9% with the DMFCC system, an error rate of 1.2% is obtained when DMFCC feature vectors are added to MFCC feature vectors to form the combined feature. The experimental results confirm that GMM is efficient for language- and text-independent speaker identification.
Key-Words: Speaker Identification, Mel-scale filter bank, Gaussian filters, Mel Frequency Cepstral Coefficient, Dynamic Mel Frequency Cepstral Coefficient, Gaussian Mixture Model.
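The abstract does not spell out the decision rule or how IDER is computed, so the following is only a minimal sketch of how GMM-based identification is commonly scored: each speaker is represented by a diagonal-covariance GMM, a test segment is assigned to the speaker whose model gives the highest average frame log-likelihood, and IDER is the percentage of segments assigned to the wrong speaker. The function names, the toy two-dimensional models, and all numbers are assumptions made for illustration; they are not the paper's features, models, or results, and model training (for example with EM) is omitted.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm); gmm is a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify(frames, models):
    """Pick the speaker whose GMM has the highest average frame log-likelihood."""
    def score(gmm):
        return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
    return max(models, key=lambda spk: score(models[spk]))

def identification_error_rate(trials, models):
    """Percentage of test segments assigned to the wrong speaker."""
    errors = sum(1 for frames, truth in trials if identify(frames, models) != truth)
    return 100.0 * errors / len(trials)

# Hypothetical toy models: two speakers, two mixture components each,
# 2-D feature vectors standing in for real cepstral features.
models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Each trial is (list of feature frames, true speaker label); values are made up.
trials = [
    ([[0.2, -0.1], [0.9, 1.1]], "spk_a"),
    ([[5.1, 4.8], [6.2, 5.9]], "spk_b"),
]

print(identification_error_rate(trials, models))
```

Averaging over frames makes scores comparable across segments of different lengths, which matters when, as in the abstract, test segments of several durations are evaluated.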
Related papers
Performance Analysis of Speaker Identification System Using GMM with VQ
Personal identity identification is an important requirement for controlling access to protected resources. Biometric identification using certain features of a person is a more secure solution for security identification. Advances in speech processing technology and digital signal processors have made possible the design of high-performance and practical speaker recognition systems. A more...
Robust Text Independent Speaker Identification Using Hybrid GMM-SVM System
This paper introduces and motivates the use of the statistical methods Gaussian Mixture Model (GMM) and Support Vector Machines (SVM) for robust text-independent speaker identification. Features are extracted from the dialect DR1 of the TIMIT corpus. They are represented by MFCC, energy, Delta and Delta-Delta coefficients. GMM is used to model the feature extractor of the input speech signal and SV...
Constructing the Discriminative Kernels Using GMM for Text-Independent Speaker Identification
In this paper, a class of GMM-based discriminative kernels is proposed for speaker identification. We map an utterance vector into a matrix by finding the sequence of components that have the maximum likelihood in the GMM for all frame vectors. A weight matrix, obtained from the GMM's parameters, is also used. SVMs are then used for classification. A one-versus-rest fashion is u...
Text Independent Speaker Modeling and Identification Based On MFCC Features
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. Here, describe a ...
Text-independent speaker recognition by speaker-specific GMM and speaker adapted syllable-based HMM
We present a new text-independent speaker recognition method that combines a speaker-specific Gaussian Mixture Model (GMM) with a syllable-based HMM adapted by MLLR or MAP. The robustness of this speaker recognition method to changes in speaking style was evaluated. The speaker identification experiment using the NTT database, which consists of sentence data uttered at three speed modes (normal, fast and ...